Data storage using a cache.



Data in pages is mapped into a very large virtual external address space (25) through a cache without disturbing the logical view of the data and without having to assign physical or real backing store to said logical view. A data cache (27) is used in which pages are indexed according to a logical address (23) intermediate to their virtual address and their physical location in external storage (5). Pages common to two or more files are updated in place in the cache, while pages bound to only one file are shadow copied.
DATA STORAGE USING A CACHE



This invention relates generally to data storage and, more particularly, to the type of data storage, sometimes referred to as cache storage, used in conjunction with a central processing unit (CPU) to expedite the processing of instructions.

A typical data processing machine comprises an instruction processor coupled to a hierarchically organised and least recently used (LRU) ordered storage system containing software and data. The fastest, most rapidly accessed storage is positioned closest to the instruction processor and is placed at the top of the hierarchy. Progressively slower forms of storage, containing the bulk of the information, occupy lower positions within the hierarchy.

Because storage costs increase dramatically with speed, many computer systems divide the physical storage subsystem into a number of performance levels. Some of these levels, such as direct access storage devices (DASD) and tape, have been treated as peripheral I/O devices and are accessed over an asynchronous path. Other levels, such as random access memory (RAM) and cache, have been treated directly by system hardware and accessed over a synchronous path as part of internal storage.

The term "internal storage" is customarily applied to that portion of storage randomly addressable for single read or write transfers. In IBM systems, internal storage is byte addressable except for an extension ("expanded store"). Expanded store is randomly accessed on a block or page addressable (4096 bytes/page) basis. It is managed as an LRU real-memory-backed paging store. Similarly, "external storage" is applied to that bulk portion of storage that is not randomly addressable and must be directly accessed, as on DASD.

An internal store is deemed "synchronous" when a referencing processor idles until a return is received. Generally, if the data being sought resides in external store (beyond a point called the "I/O boundary"), a referencing processor will search for another task to perform instead of waiting. This task switching is disruptive in that a retrieval path must be established to the new data, and the processing state of the prior task must be saved. When the retrieval from external storage has been completed, it is again necessary to switch the CPU back to the former task.

Cache and Cache invalidate 

A "cache" is typically an indexable LRU-ordered collection of pages in a buffer positioned in a path to data or instructions so as to reduce access time. The term "cache invalidate" refers to either removing a named page from the cache or providing an indication that it is invalid, for example following a change to the base page on some other data path so that the version in the cache is no longer accurate.
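As a minimal sketch of the two definitions above, assuming pages are identified by hashable names and the buffer holds a fixed number of pages, an LRU-ordered, indexable cache with an invalidate operation might look as follows in Python. The class `LRUPageCache` and its method names are hypothetical and are not taken from any IBM system.

```python
from collections import OrderedDict


class LRUPageCache:
    """Indexable, LRU-ordered collection of pages (hypothetical illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page name -> contents, least recently used first

    def get(self, name):
        """Return the cached page, or None on a miss; a hit becomes most recently used."""
        if name not in self.pages:
            return None
        self.pages.move_to_end(name)
        return self.pages[name]

    def put(self, name, data):
        """Insert or refresh a page, evicting the least recently used page if over capacity."""
        self.pages[name] = data
        self.pages.move_to_end(name)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)

    def invalidate(self, name):
        """'Cache invalidate': drop a named page whose base copy changed on another path."""
        self.pages.pop(name, None)
```

For example, with a capacity of two, putting A and B, reading A, and then putting C evicts B, since B is then the least recently used page.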

A processor or CPU system typically includes an operating system, a local cache operatively formed from processor internal memory, DASD-oriented external store, and storage protection (lock) and cache resource managers. Processes executing on a CPU generate read and write operations by way of the operating system. In turn, the read and write operations utilise the cache and lock resource managers to establish directory-protectable access paths to pages currently resident in cache or as refreshed into cache from the external store.

"Virtual storage" involves the addressing of a storage space much larger than that available in the internal storage of a CPU. CPU processes tend to reference storage in nonuniform, highly localised patterns, making it possible for a small amount of real storage, properly managed, to provide effective access to a much larger amount of virtual storage. If the referenced data is not available in internal storage, then new pages are swapped in from external storage, a process referred to as 'paging'.

The capacity of the system to manage pages is determined largely by the number of slots or "page frames" set aside in internal store for paging. If the sum of the subsets of pages referenced by processes exceeds the number of page frames in internal storage, then at some stage it will be necessary to access external storage, a requirement known as faulting.
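The effect of the page-frame count on faulting can be illustrated with a small simulation. This is a sketch only: it assumes LRU replacement (the text does not specify a replacement policy) and models a reference string as a list of page numbers; `count_page_faults` is a hypothetical helper.

```python
from collections import OrderedDict


def count_page_faults(references, num_frames):
    """Count faults when a reference string is served by num_frames page frames.

    Assumes LRU replacement. A fault occurs whenever the referenced page is not
    resident and must be brought in from external storage.
    """
    frames = OrderedDict()  # resident pages, least recently used first
    faults = 0
    for page in references:
        if page in frames:
            frames.move_to_end(page)  # hit: mark as most recently used
            continue
        faults += 1
        if len(frames) >= num_frames:
            frames.popitem(last=False)  # evict the LRU page to free a frame
        frames[page] = True
    return faults
```

With the reference string `[1, 2, 3, 1, 2, 4, 1, 2]`, three frames give 4 faults while two frames give 8, because the localised working set of three pages no longer fits.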

In a large data processing system a number of processes are typically running at any given time. Each process references its own subsets of pages, conventionally under the control of a part of the operating system referred to as a virtual demand paging system. In such a system the pages are usually stored in an associative store enabling a process to identify and access a desired subset by an associative tag or filename.

Ambiguity arises in such a system where two different filenames are used to access the same physical page. These "synonyms" are wasteful of cache space and create a cache invalidate problem, since the cache manager usually has no way of associating the many possible names for the same data.
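The synonym problem can be reproduced with a cache keyed by filename. In this hypothetical sketch, two filenames resolve to the same physical page `R7`; the cache stores two copies, and an update through one name leaves the other copy stale, because the cache manager cannot associate the two names.

```python
# Hypothetical names: one physical page (R7) reachable under two filenames.
physical_store = {"R7": "old"}
name_to_physical = {"payroll.dat": "R7", "backup.dat": "R7"}
cache = {}  # filename -> cached copy; the cache knows nothing about R7


def read(name):
    """Fetch a page into the cache under the name it was requested by."""
    if name not in cache:
        cache[name] = physical_store[name_to_physical[name]]
    return cache[name]


def write(name, value):
    """Update the page through one name; only that name's cached copy changes."""
    cache[name] = value
    physical_store[name_to_physical[name]] = value
```

After reading through both names and writing through `payroll.dat`, the copy cached under `backup.dat` still holds the old value even though the physical page has changed.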

This difficulty is addressed in U.S. Patent 4,612,612, entitled "Virtually Addressed Cache", issued September 16, 1986, by treating the cache as a virtually addressable entity. However, since in the system described in that patent pages are cached by their virtual addresses, each page is treated independently, even though two virtual page addresses may reference the same ultimate address in real storage. Also, there is a finite possibility in such a system that virtual addresses in different address spaces are mappable to the same real address.

It is accordingly an object of this invention to provide an improved arrangement for managing access to pages mapped into a very large virtual external address space through a cache, which effectively reduces the problem presented by synonyms.

According to the invention we provide a method for accessing data in a data processing system having a processor, internal storage organised as a data cache formed from addressable pages, and external storage addressable to access multiple pages associated in files, the method comprising the steps of referencing pages in a given file according to their addresses in a linear space as mapped into a virtual external storage address (VESA) and then as mapped into a physical address in external storage, and writing referenced pages into the cache using their VESA addresses as indexing arguments if not otherwise located in said cache, and, in response to a write request from said processor, updating in place those cached pages common to two files, otherwise shadow copying updated pages into another cache location using another VESA address.
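The read path of the method can be sketched as two dictionary lookups with the cache sitting between them. This is an illustration under stated assumptions: VESA and physical addresses are modelled as strings (`"VF0"`, `"R2"`), DASD is a dictionary, and `TwoLevelStore` is a hypothetical name. Because the cache is indexed by VESA, two files that share a page hit the same cache entry.

```python
from dataclasses import dataclass, field


@dataclass
class TwoLevelStore:
    """Sketch of the two mappings: (file, page) -> VESA -> physical DASD address."""

    page_map: dict = field(default_factory=dict)  # (file, page) -> VESA (first mapping)
    backing: dict = field(default_factory=dict)   # VESA -> physical address (second mapping)
    dasd: dict = field(default_factory=dict)      # physical address -> page data
    cache: dict = field(default_factory=dict)     # VESA -> page data (the data cache)

    def read(self, file, page):
        """Return a page, loading it into the VESA-indexed cache on a miss."""
        vesa = self.page_map[(file, page)]
        if vesa not in self.cache:
            # Miss: only now is the VESA-to-physical mapping consulted.
            self.cache[vesa] = self.dasd[self.backing[vesa]]
        return self.cache[vesa]
```

Two files whose page maps both point at `VF0` therefore share one cache entry, and no synonym copy is created.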

We further provide a data processing system comprising a processor having internal storage formed from RAM-addressable pages and external storage formed from DASD-addressable pages, characterised by a cache adapted to assign device-independent locations in a logical external storage space (VF0, VF1) to pages (AV1P0, AV1P1) to be accessed; to assign, in response to an update (AV2P0, AV2P1') of a page not common to two files of associated pages, a further logical external storage space (VF2 for AV2P1'); and, in response to an update of a page common to two files, to update such page in place in the cache, or otherwise to write a shadow copy thereof in cache assigning yet another logical external storage space (VF2') thereto.

In contrast to the aforementioned U.S. Patent No. 4,612,612, the invention uses an additional layer of indirection (VESA); that is, pages are indexed in cache by their VESA arguments, avoiding synonym conflict. This requires mapping to external storage via logical-to-VESA and VESA-to-real translation.

Advantageously, the method of this invention (a) generates a unique name for caching and avoids synonymy; (b) uses a unique name for locking; (c) stores data in cache and writes it out only upon change; (d) does not invalidate the cache if the location of a page in real storage changes, because the logical address remains the same (invariant); and (e) does not require physical backing for the virtual file.
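Advantages (d) and (e) follow from indexing the cache by VESA rather than by physical location. In the hypothetical sketch below (frame and DASD names such as `VF99` and `R50` are invented for illustration), moving a page on DASD only edits the VESA-to-physical map, and a frame can be cached with no physical backing at all.

```python
# Hypothetical state: the cache is keyed by VESA, not by DASD location.
cache = {"VF12": "page 1"}   # VESA -> cached page
backing = {"VF12": "R2"}     # VESA -> physical DASD location
dasd = {"R2": "page 1"}      # DASD location -> stored page


def relocate(vesa, new_location):
    """Move a page to a new DASD location; cache entries are left untouched."""
    old_location = backing[vesa]
    dasd[new_location] = dasd.pop(old_location)
    backing[vesa] = new_location


# (d) Relocation changes only the second mapping, so nothing is invalidated.
relocate("VF12", "R50")

# (e) A frame of a new virtual file can live in the cache with no backing.
cache["VF99"] = "scratch page"
```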

Brief Description of the Drawing 

Fig. 1 sets out the organisation of storage in relation to a large mainframe CPU.

Fig. 2 conceptually depicts virtual-to-real address translation, associative memory assist, and cache placement according to the prior art.

Fig. 3 shows the concept of virtual caching according to the prior art.

Fig. 4 depicts software caching and its placement according to the invention.

Fig. 5 sets forth the manner by which the synonym problem is avoided using VESA-ordered pages in a cache according to the invention.

Fig. 6 illustrates updates in place and shadow copying.

Fig. 7 is another mapping example involving several different views according to the invention.

The invention can be conveniently practised in a general purpose computer such as an IBM/360 or 370 architected CPU having an IBM MVS operating system. An IBM/360 architected CPU is fully described in Amdahl et al., U.S. Patent 3,400,371, "Data Processing System", issued September 3, 1968.

A typical MVS operating system is described in IBM publication GC28-1150, "MVS/Extended Architecture System Programming Library: System Macros and Facilities", Vol. 1.

In this description the term 'page' will be used 
to designate a block of bytes. The number of bytes 
in a page is typically 4096. 

Fig. 1 shows the relationship of organised storage to the central processing unit 1 (CPU) in a computer such as that referenced above. CPU 1 can access both internal storage 3 and external storage 5 over paths 11 and 13. Internal storage 3 includes processor storage 2, whose contents are byte addressable and randomly accessible, and expanded storage 4, whose contents are page addressable and randomly accessible. External storage 5 comprises one or more DASDs and stores multiple pages of information referenced by applications executing on CPU 1.

Typically, an application executing on the CPU references a page in cache 9 by either its virtual/linear or its real space address. Cache 9 may be hardware or software implemented. If software implemented, the cache could be located anywhere in internal storage 3. If the page is not available in cache 9, then either expanded storage 4 or external storage 5 needs to be accessed.

Where multiple pages are accessed across the I/O boundary 7 in external storage, they may be processed according to methods as set forth in Luiz et al., U.S. Patent 4,207,609, "Path Independent Device Reservation and Reconnection in a Multi-CPU and Shared Device Access System", issued June 10, 1980. When an access is made to internal storage, the processor waits until the access is completed. When access is made across the I/O boundary, the processor invokes another task or process while awaiting fetch (access) completion.

Address Translation and Cache Placement 

Referring now to Fig. 2, there is conceptually 
depicted virtual-to-real address translation, associ- 
ative memory assist, and cache placement accord- 
ing to the prior art. 

As shown in Fig. 2, row (1), the conversion of a virtual address to a real address is usually implemented in hardware or fast microcode and involves an address translation or mapping. In a typical IBM System/370 machine, the address translation mechanism will decompose a virtual address into a page address and a relative page displacement.

As previously mentioned, internal storage set aside in support of paging is organised into fixed locations called page frames. A page table may be used to correlate a virtual address reference in a program and the real address of a page frame in internal storage. The effective page address can be ascertained by adding the relative page displacement to the page frame location. A further discussion may be found in Lorin and Deitel, "Operating Systems", The Systems Programming Series, copyright 1981 by Addison-Wesley Publishing Co., chapter 14 describing virtual storage, pp. 293-314.
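The page-table translation described above can be sketched as follows, assuming the 4096-byte page size stated in this description and a single-level page table held in a Python dictionary. Real System/370 translation also involves segment tables and hardware assists, which this hypothetical `translate` helper omits.

```python
PAGE_SIZE = 4096  # bytes per page, as stated in this description


def translate(virtual_address, page_table):
    """Split a virtual address into page number and displacement, then add the
    displacement to the real address of the page frame found in the page table."""
    page_number, displacement = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # Page not resident in a frame: it must be fetched from external storage.
        raise LookupError(f"page fault on virtual page {page_number}")
    frame_address = page_table[page_number]  # real address of the page frame
    return frame_address + displacement
```

For instance, virtual address `2 * 4096 + 10` with page 2 held in the frame at `0x5000` translates to `0x500A`.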

Referring now to Fig. 2, row (2), there is shown one prior art technique for expediting the virtual-to-real address translation through the use of a "translation lookaside buffer" (TLB). The TLB 15 is formed from random access memory and is operative as an LRU associative memory in which translation of the address of the data being accessed is performed in parallel with decoding of the instruction by the CPU.
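A TLB can be modelled in software as a small LRU map placed in front of the page table. This sketch assumes a dictionary page table and a hypothetical `TLB` class with hit and miss counters; it does not capture the hardware's overlap of translation with instruction decoding.

```python
from collections import OrderedDict

PAGE_SIZE = 4096


class TLB:
    """Small LRU associative memory caching page -> frame translations."""

    def __init__(self, page_table, entries=4):
        self.page_table = page_table  # slower, complete translation table
        self.entries = entries        # number of TLB entries
        self.map = OrderedDict()      # page -> frame address, LRU first
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        page, displacement = divmod(virtual_address, PAGE_SIZE)
        if page in self.map:
            self.hits += 1
            self.map.move_to_end(page)
        else:
            self.misses += 1
            self.map[page] = self.page_table[page]  # fall back to the page table
            if len(self.map) > self.entries:
                self.map.popitem(last=False)        # evict the LRU translation
        return self.map[page] + displacement
```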

If a real cache 17 is placed ahead of real CPU main memory 19, as shown for instance in Fig. 2, row (3), then it has the advantage of storing pages with different virtual addresses and pages located in different virtual address spaces. However, it suffers the disadvantage that cache accessing occurs only after the virtual-to-real translation has been performed. In a real cache, address translation is first performed, followed by a table lookup.

Fig. 3 shows the placement of a virtual cache 21 prior to the address translation and real internal storage 19, an arrangement embodying principles found in U.S. Patent No. 4,612,612, identified above. As pointed out in that patent, at Col. 2, lines 43-49:

"The buffer typically contains a small fraction of the main store data at any time. In the virtually addressed buffer, the location of the data is not a function of main store real addresses, but is a function of the virtual addresses. Therefore, main store addresses do not map to unique buffer addresses. More than one real address can be translated to the same virtual address location in the buffer."

The solution proposed in this patent is summarised at Col. 2, line 62, through Col. 3, line 2:

"Since different virtual addresses may specify the same data location that corresponds to a single real address location in main-store, it is possible that the virtual-address buffer will store more than one copy, called a synonym, of the same data at different locations. For this reason, a real-to-virtual translator translates main store real addresses to all buffer virtual addresses to locate buffer resident synonyms when modified data is stored into the buffer."
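The synonym handling quoted above can be sketched as a virtually addressed buffer with a reverse (real-to-virtual) index. The class and method names are hypothetical, and the quoted passage says only that synonyms are located when modified data is stored; updating every located copy, as this sketch does, is an assumption (invalidating them would be another option).

```python
from collections import defaultdict


class VirtualBuffer:
    """Virtually addressed buffer with a real-to-virtual index (hypothetical)."""

    def __init__(self):
        self.lines = {}                   # virtual address -> buffered data
        self.real_of = {}                 # virtual address -> real address
        self.synonyms = defaultdict(set)  # real address -> buffered virtual addresses

    def load(self, virtual, real, data):
        """Buffer data under its virtual address and record the real address."""
        self.lines[virtual] = data
        self.real_of[virtual] = real
        self.synonyms[real].add(virtual)

    def store(self, virtual, data):
        """On a store, locate every buffer-resident synonym and update it."""
        real = self.real_of[virtual]
        for synonym in self.synonyms[real]:
            self.lines[synonym] = data
```

Both copies still occupy buffer space after the store, which is the waste the present invention avoids.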

In the method of this invention, the cache is a software-created and software-managed portion of internal storage. It serves as an external storage cache. In this regard, such a software cache is operatively different from the CPU cache hardware arrangement described in the Woffinden patent quoted above. In Woffinden's CPU cache, address resolution and access are in terms of microseconds, whereas resolution and access in the software external storage cache are in terms of milliseconds. This permits additional or refined processing.

Referring now to Fig. 4, there is depicted software caching and its placement according to the invention. Two address translations or levels of indirection are shown. The first is the virtual or logical address 23 mapped into a virtual external storage address space (VESA) 25, while the second is VESA mapped into the real external storage address space 5. Access to the cache 27 is only by way of a VESA argument. Cache 27 is positioned subsequent to the first mapping and prior to the second.

Avoidance of Synonymy Using VESA Ordered Pairs

The use of two levels of indirection in the method of this invention takes advantage of the nature of base plus displacement addressing as described in connection with demand paging and virtual addressing. In this regard, suppose an application executing on CPU 1 specifies the 100th relative page. If there are multiple versions of that page, then each version has the same logical address. These are different versions of the same file.

In this invention, the mapping from the name 
space to the intermediary space is many to one. 
Thus, two linear spaces sharing the same page 
would map to one single virtual external storage 
address (VESA) without synonym problems. The 
use of intermediate external storage avoids the 
synonym problem. 

Referring now to Fig. 5, there are set out two versions of the same file and the virtual external cache 27. Illustratively, the first file 29 bears the logical name File A Version 1 (AV1). It comprises original pages 0 and 1. The second file 31 bears the logical name File A Version 2 (AV2). AV2 includes original page 0 and modified page 1 (page 1'). The pages 0, 1, and 1' are mapped into the VESA addresses (so-called virtual frames) VF0, VF1, and VF2, respectively. Only one copy of page 0 need be stored in cache 27.
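The Fig. 5 arrangement can be written down as a many-to-one page map. In this hypothetical encoding, each (version, page) pair names a VESA virtual frame, so both versions read the same cached copy of page 0.

```python
# Fig. 5 as data (hypothetical encoding): logical pages of two versions of
# File A map many-to-one onto VESA virtual frames.
page_map = {
    ("AV1", 0): "VF0", ("AV1", 1): "VF1",
    ("AV2", 0): "VF0", ("AV2", 1): "VF2",  # AV2 page 1 is the modified page 1'
}
cache = {"VF0": "page 0", "VF1": "page 1", "VF2": "page 1'"}


def view(version):
    """Assemble one file version's pages through its page map."""
    return [cache[page_map[(version, n)]] for n in (0, 1)]
```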

Updating In Place and Shadow Copying 

The method of the invention provides that, in response to a write of a page common to the original and updated files, the common page is updated in place. This renders the updated values available to both files (views). If the page to be written or updated is not common, then a shadow copy of it is written in cache, assigning yet another logical external storage space thereto.

Referring now to Fig. 6, there are shown changed files AV1 and AV2 and a different storage mix in cache 27. More particularly, assume that AV1 includes an original page 0 and an original page 1. Also, assume that AV2 consists of an original page 0 and a modified page 1'. The manager for cache 27 assigns VESA address VF0 to page 0, VF1 to page 1, and VF2' to page 1'. In the event an application updates page 0, an update in place at VF0 will occur because page 0 is common to AV1 and AV2. However, an update to page 1' will be processed by writing the changed page 1'' to VESA address VF2'' in the cache and leaving the old page 1' as the shadow at cache VESA address VF2'.
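The update rule of Fig. 6 can be sketched as a single write routine. Assumptions: a page is treated as "common" when its VESA frame is referenced by more than one file in the page map, and new VESA frames come from a hypothetical counter starting at `VF100` (the figure's `VF2''` is simply another fresh frame).

```python
import itertools

_next_frame = itertools.count(100)  # hypothetical VESA allocator


def new_vesa():
    """Allocate a fresh, unused VESA frame name."""
    return f"VF{next(_next_frame)}"


def write_page(page_map, cache, file, page, data):
    """Update in place if the frame is shared by two or more files; otherwise
    shadow copy: write to a fresh VESA frame and keep the old frame intact."""
    vesa = page_map[(file, page)]
    sharers = {f for (f, _), v in page_map.items() if v == vesa}
    if len(sharers) > 1:
        cache[vesa] = data          # common page: the update is visible to every view
        return vesa
    fresh = new_vesa()
    cache[fresh] = data             # new version goes to a new cache location
    page_map[(file, page)] = fresh  # remap only this file's page
    return fresh                    # the old frame remains as the shadow
```

Applied to Fig. 6, an update of page 0 lands at `VF0` for both views, while an update of page 1' goes to a fresh frame and leaves page 1' at `VF2'` as the shadow.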

Algorithmic Expression of the Method Using Another Example

Referring now to Fig. 7, there is shown the double mapping of pages from files 1 and 2 to a VESA-ordered cache and then to real internal or external storage. Consider the following:

Suppose the initial state of the system consisted of file 1 formed from pages 1 and 2. Pages 1 and 2 are mapped into VESA addresses VF12 and VF40, and then mapped into real addresses R2 and R98. Next, assume that file 2 was created initially as an image of file 1. The first concordance for pages 1 and 2 includes VF12 and VF40. The second concordance includes R2 and R98.

In order for page 2 of file 2 to be updated without sharing it with file 1, it is first necessary to allocate a new VESA, i.e., VF76, in cache 27. The concordance or page map for file 2 is then altered. After this, updated page 2' is written to cache location VF76. Real storage in the form of a DASD location, i.e., R102, is allocated and page 2' is copied therein. Parenthetically, VF40 remains the shadow location to VF76.

If page 1 is rewritten, then no new allocation in cache 27 is needed because the page is shared between the files. The existing mapping to VF12 remains the same, and the updated page 1' is written therein. Likewise, the contents of VF12 are copied to DASD real location R2.
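The Fig. 7 walk-through can be replayed step by step with the two concordances held as dictionaries. The frame and DASD names are those given in the text; the dictionary encoding is an assumption made for illustration.

```python
# First concordance: (file, page) -> VESA. File 2 starts as an image of file 1.
first = {(1, 1): "VF12", (1, 2): "VF40",
         (2, 1): "VF12", (2, 2): "VF40"}
# Second concordance: VESA -> real DASD location.
second = {"VF12": "R2", "VF40": "R98"}
cache = {"VF12": "page 1", "VF40": "page 2"}
dasd = {"R2": "page 1", "R98": "page 2"}

# Update page 2 of file 2 without sharing it with file 1.
first[(2, 2)] = "VF76"          # allocate a new VESA and alter file 2's page map
cache["VF76"] = "page 2'"       # write the updated page to the cache
second["VF76"] = "R102"         # allocate a DASD location ...
dasd["R102"] = cache["VF76"]    # ... and copy the page there
# VF40 is untouched: it remains the shadow of VF76 and still serves file 1.

# Rewrite page 1: it is shared, so no new allocation is needed.
cache["VF12"] = "page 1'"             # update in place through the existing VF12
dasd[second["VF12"]] = cache["VF12"]  # copy VF12 to its DASD location, R2
```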

If the changes to cache 27 can be batched to the point where the cache is filled, then the transfer to DASD real storage can be made at one time. This yields a transfer efficiency when compared to multiple discrete backing store updates.

If file 1 is deleted before it is written (e.g., if it is a temporary file), then none of its constituent virtual frames are ever allocated in real storage. Allocation to real storage is only required when frames are actually written; their existence in the cache does not require this.
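The last two paragraphs, batched transfer and allocation only on write-out, can be combined in one sketch of a write-back cache. Assumptions: the cache counts as "full" once `capacity` changed frames have accumulated, DASD locations come from a hypothetical allocator `R1`, `R2`, ..., and `WriteBackCache` is an invented name. Frames deleted before the batch is written never receive a DASD location.

```python
import itertools


class WriteBackCache:
    """VESA-indexed cache with lazy DASD allocation and batched write-out."""

    def __init__(self, capacity):
        self.capacity = capacity  # changed frames that count as a full cache
        self.frames = {}          # VESA -> data
        self.dirty = set()        # VESA frames changed since the last write-out
        self.backing = {}         # VESA -> DASD location, allocated lazily
        self.dasd = {}            # DASD location -> data
        self._locations = (f"R{n}" for n in itertools.count(1))
        self.batches = 0

    def write(self, vesa, data):
        self.frames[vesa] = data
        self.dirty.add(vesa)
        if len(self.dirty) >= self.capacity:
            self.flush()

    def flush(self):
        """Write all changed frames to DASD at one time."""
        if not self.dirty:
            return
        for vesa in sorted(self.dirty):
            if vesa not in self.backing:  # first write-out: allocate real storage
                self.backing[vesa] = next(self._locations)
            self.dasd[self.backing[vesa]] = self.frames[vesa]
        self.dirty.clear()
        self.batches += 1

    def delete(self, vesas):
        """Drop frames (e.g. of a temporary file) before they are written out."""
        for vesa in vesas:
            self.frames.pop(vesa, None)
            self.dirty.discard(vesa)
```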

Claims 


1. A method for accessing data in a data processing system having a processor, internal storage organised as a data cache formed from addressable pages, and external storage addressable to access multiple pages associated in files, the method comprising the steps of

referencing pages in a given file according to their addresses in a linear space as mapped into a virtual external storage address (VESA) and then as mapped into a physical address in external storage, and writing referenced pages into the cache using their VESA addresses as indexing arguments if not otherwise located in said cache, and, in response to a write request from said processor, updating in place those cached pages common to two files, otherwise shadow copying updated pages into another cache location using another VESA address.

2. A method as claimed in claim 1, including the step of writing the pages out from the data cache to physical addresses in external storage only upon change.

3. A method as claimed in claim 1 or claim 2 in which the validity of a page in the cache is maintained as long as the VESA address remains unchanged.

4. A data processing system comprising a processor (1) having internal storage (3) formed from RAM-addressable pages and external storage (5) formed from DASD-addressable pages, characterised by a cache (2) adapted to assign device-independent locations in a logical external storage space (VF0, VF1) to pages (AV1P0, AV1P1) to be accessed,

to assign, in response to an update (AV2P0, AV2P1') of a page not common to two files of associated pages, a further logical external storage space (VF2 for AV2P1') and,

in response to an update of a page common to two files, to update such page in place in the cache, or otherwise to write a shadow copy thereof assigning yet another logical external storage space (VF2') thereto.

5. A system as claimed in claim 4, wherein said cache is formed from non-volatile storage and is further adapted, in response to an indication that the cache is full, to allocate space in external storage to updated pages, copy updated pages to that space, and form a concordance between the logical external space location and the physical location in external storage.

6. A system as claimed in claim 5, in which the cache is formed from bistable remanent magnetic material or bistable battery-backed electrostatic material.